Nonverbal Communication


React to This (RTT): A Nonverbal Turing Test for Embodied AI

Zhang, Chuxuan, Etesam, Yasaman, Lim, Angelica

arXiv.org Artificial Intelligence

We propose an approach to test embodied AI agents for interaction awareness and believability, particularly in scenarios where humans push them to their limits. Turing introduced the Imitation Game as a way to explore the question: "Can machines think?" The Total Turing Test later expanded this concept beyond purely verbal communication, incorporating perceptual and physical interaction. Building on this, we propose a new guiding question: "Can machines react?" and introduce the React to This (RTT) test for nonverbal behaviors, presenting results from an initial experiment.


Augmented Body Communicator: Enhancing daily body expression for people with upper limb limitations through LLM and a robotic arm

Zhou, Songchen, Armstrong, Mark, Barbareschi, Giulia, Ajioka, Toshihiro, Hu, Zheng, Ando, Ryoichi, Yoshifuji, Kentaro, Muto, Masatane, Minamizawa, Kouta

arXiv.org Artificial Intelligence

Individuals with upper limb movement limitations face challenges in interacting with others. Although robotic arms are currently used primarily for functional tasks, there is considerable potential to explore ways to enhance users' body language capabilities during social interactions. This paper introduces an Augmented Body Communicator system that integrates robotic arms and a large language model. Through the incorporation of kinetic memory, disabled users and their supporters can collaboratively design actions for the robot arm. The LLM system then provides suggestions on the most suitable action based on contextual cues during interactions. The system underwent thorough user testing with six participants who have conditions affecting upper limb mobility. Results indicate that the system improves users' ability to express themselves. Based on our findings, we offer recommendations for developing robotic arms that support disabled individuals with body language capabilities and functional tasks.
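
As an illustration of the kind of pipeline the abstract describes, the sketch below shows how a large language model could be prompted to pick one pre-recorded robot-arm action from a small "kinetic memory" library given conversational context. This is a minimal sketch under assumptions: the action names, the KineticMemoryAction structure, build_prompt, and select_action are hypothetical and not taken from the paper, and the LLM call is stubbed so the example runs on its own.

    from dataclasses import dataclass

    @dataclass
    class KineticMemoryAction:
        name: str               # label chosen by the user and their supporter
        description: str        # short natural-language description of the motion
        joint_trajectory: list  # pre-recorded joint waypoints

    # A tiny illustrative action library; real entries would be co-designed.
    ACTION_LIBRARY = [
        KineticMemoryAction("wave", "friendly greeting wave", [[0.0, 0.5], [0.2, 0.8]]),
        KineticMemoryAction("nod_proxy", "small up-down motion signalling agreement", [[0.1, 0.1], [0.1, 0.3]]),
        KineticMemoryAction("point_left", "point toward the speaker's left", [[0.6, 0.2], [0.9, 0.2]]),
    ]

    def build_prompt(context):
        # Compose an LLM prompt asking for the most suitable stored action.
        options = "\n".join(f"- {a.name}: {a.description}" for a in ACTION_LIBRARY)
        return (
            "Conversation context:\n" + context + "\n\n"
            "Available body-language actions:\n" + options + "\n\n"
            "Reply with the single action name that best fits the context."
        )

    def select_action(context, llm=None):
        # Ask an LLM (stubbed here) to choose an action; fall back to the first entry.
        prompt = build_prompt(context)
        reply = llm(prompt) if llm else "wave"  # stand-in for a real LLM call
        by_name = {a.name: a for a in ACTION_LIBRARY}
        return by_name.get(reply.strip(), ACTION_LIBRARY[0])

    chosen = select_action("A colleague greets the user in the hallway.")
    print("Selected action:", chosen.name)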


Human-like Nonverbal Behavior with MetaHumans in Real-World Interaction Studies: An Architecture Using Generative Methods and Motion Capture

Chojnowski, Oliver, Eberhard, Alexander, Schiffmann, Michael, Müller, Ana, Richert, Anja

arXiv.org Artificial Intelligence

Socially interactive agents are gaining prominence in domains like healthcare, education, and service contexts, particularly virtual agents due to their inherent scalability. To facilitate authentic interactions, these systems require verbal and nonverbal communication through, e.g., facial expressions and gestures. While natural language processing technologies have rapidly advanced, incorporating human-like nonverbal behavior into real-world interaction contexts is crucial for enhancing the success of communication, yet this area remains underexplored. One barrier is creating autonomous systems with sophisticated conversational abilities that integrate human-like nonverbal behavior. This paper presents a distributed architecture using Epic Games MetaHuman, combined with advanced conversational AI and camera-based user management, that supports methods like motion capture, handcrafted animation, and generative approaches for nonverbal behavior. We share insights into a system architecture designed to investigate nonverbal behavior in socially interactive agents, deployed in a three-week field study in the Deutsches Museum Bonn, showcasing its potential in realistic nonverbal behavior research.
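
To make the architectural idea concrete, here is a small hedged sketch of how a behavior request might be routed to one of the three nonverbal-behavior sources the abstract lists (motion capture, handcrafted animation, generative methods) before playback on the MetaHuman. The BehaviorRequest fields, the source labels, and resolve_animation are assumptions for illustration, not the system's actual interfaces.

    from dataclasses import dataclass

    @dataclass
    class BehaviorRequest:
        utterance: str   # text the agent is about to speak
        intent: str      # e.g. "greet", "explain", "farewell"
        source: str      # one of "mocap", "handcrafted", "generative"

    def resolve_animation(req):
        # Map a request to an animation identifier for the rendering process to play.
        if req.source == "mocap":
            return "mocap_clip:" + req.intent       # pre-recorded capture session
        if req.source == "handcrafted":
            return "anim_asset:" + req.intent       # artist-authored animation
        return "generated:" + str(abs(hash(req.utterance)) % 10)  # placeholder for a generative model

    request = BehaviorRequest("Welcome to the museum!", "greet", "handcrafted")
    print(resolve_animation(request))   # -> anim_asset:greet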


Allo-AVA: A Large-Scale Multimodal Conversational AI Dataset for Allocentric Avatar Gesture Animation

Punjwani, Saif, Heck, Larry

arXiv.org Artificial Intelligence

The scarcity of high-quality, multimodal training data severely hinders the creation of lifelike avatar animations for conversational AI in virtual environments. Existing datasets often lack the intricate synchronization between speech, facial expressions, and body movements that characterize natural human communication. To address this critical gap, we introduce Allo-AVA, a large-scale dataset specifically designed for text and audio-driven avatar gesture animation in an allocentric (third person point-of-view) context. Allo-AVA consists of approximately 1,250 hours of diverse video content, complete with audio, transcripts, and extracted keypoints. Allo-AVA uniquely maps these keypoints to precise timestamps, enabling accurate replication of human movements (body and facial gestures) in synchronization with speech. This comprehensive resource enables the development and evaluation of more natural, context-aware avatar animation models, potentially transforming applications ranging from virtual reality to digital assistants.
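
Since the abstract emphasizes keypoints mapped to precise timestamps, the following sketch shows one plausible record layout and how keypoint frames could be aligned with a word-level transcript. The field names (keypoints, transcript, t, start, end) are assumptions for illustration, not the released schema.

    # Hypothetical record layout for one clip; field names are assumptions.
    sample = {
        "video_id": "clip_0001",
        "keypoints": [                 # one entry per extracted frame
            {"t": 0.00, "pose": [[0.51, 0.32], [0.49, 0.40]]},
            {"t": 0.04, "pose": [[0.52, 0.32], [0.49, 0.41]]},
            {"t": 0.08, "pose": [[0.53, 0.33], [0.48, 0.41]]},
        ],
        "transcript": [                # word-level timings from the audio track
            {"word": "hello", "start": 0.00, "end": 0.06},
            {"word": "there", "start": 0.06, "end": 0.12},
        ],
    }

    def keypoints_for_word(record, word_index):
        # Return the keypoint frames whose timestamps fall inside a word's interval.
        word = record["transcript"][word_index]
        return [f for f in record["keypoints"] if word["start"] <= f["t"] < word["end"]]

    print(keypoints_for_word(sample, 0))   # frames co-occurring with "hello"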


Nonverbal Immediacy Analysis in Education: A Multimodal Computational Model

Petković, Uroš, Frenkel, Jonas, Hellwich, Olaf, Lazarides, Rebecca

arXiv.org Artificial Intelligence

This paper introduces a novel computational approach for analyzing nonverbal social behavior in educational settings. Integrating multimodal behavioral cues, including facial expressions, gesture intensity, and spatial dynamics, the model assesses the nonverbal immediacy (NVI) of teachers from RGB classroom videos. A dataset of 400 30-second video segments from German classrooms was constructed for model training and validation. The gesture intensity regressor achieved a correlation of 0.84, the perceived distance regressor 0.55, and the NVI model 0.44 with median human ratings. The model demonstrates the potential to provide valuable support in nonverbal behavior assessment, approximating the accuracy of individual human raters. Validated against both questionnaire data and trained observer ratings, our models show moderate to strong correlations with relevant educational outcomes, indicating their efficacy in reflecting effective teaching behaviors. This research advances the objective assessment of nonverbal communication behaviors, opening new pathways for educational research.
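
A minimal sketch of how per-cue regressor outputs might be combined into a single nonverbal-immediacy estimate is given below. The feature set, the linear combination, and the weights are invented for illustration; the paper's NVI model is trained against median human ratings rather than hand-set.

    def nonverbal_immediacy(gesture_intensity, perceived_distance, smile_ratio):
        # Combine per-cue scores (each roughly in [0, 1]) into one NVI estimate.
        # Closer perceived distance should raise immediacy, so it is inverted.
        weights = (0.5, 0.3, 0.2)      # hypothetical; a trained model would learn these
        features = (gesture_intensity, 1.0 - perceived_distance, smile_ratio)
        return sum(w * f for w, f in zip(weights, features))

    # Example: energetic gesturing, teacher close to the students, frequent smiling.
    print(round(nonverbal_immediacy(0.8, 0.2, 0.6), 2))   # -> 0.76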


Explicit Modelling of Theory of Mind for Belief Prediction in Nonverbal Social Interactions

Bortoletto, Matteo, Ruhdorfer, Constantin, Shi, Lei, Bulling, Andreas

arXiv.org Artificial Intelligence

We propose MToMnet - a Theory of Mind (ToM) neural network for predicting beliefs and their dynamics during human social interactions from multimodal input. ToM is key for effective nonverbal human communication and collaboration, yet existing methods for belief modelling have not included explicit ToM modelling or have typically been limited to one or two modalities. MToMnet encodes contextual cues (scene videos and object locations) and integrates them with person-specific cues (human gaze and body language) in a separate MindNet for each person. Inspired by prior research on social cognition and computational ToM, we propose three different MToMnet variants: two involving fusion of latent representations and one involving re-ranking of classification scores. We evaluate our approach on two challenging real-world datasets, one focusing on belief prediction and the other on belief dynamics prediction. Our results demonstrate that MToMnet surpasses existing methods by a large margin while at the same time requiring a significantly smaller number of parameters. Taken together, our method opens up a highly promising direction for future work on artificially intelligent systems that can robustly predict human beliefs from their non-verbal behaviour and, as such, more effectively collaborate with humans.
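
For readers who want a concrete picture of "one MindNet per person with fusion of latent representations", here is a rough PyTorch-style sketch. The layer types, sizes, shared context encoding, and concatenation-based fusion are assumptions for illustration only and are not the authors' implementation.

    import torch
    import torch.nn as nn

    class MindNet(nn.Module):
        # Encodes one person's cues together with shared context into a latent "mind" state.
        def __init__(self, person_dim=32, context_dim=64, hidden=128):
            super().__init__()
            self.encoder = nn.GRU(person_dim + context_dim, hidden, batch_first=True)

        def forward(self, person_cues, context):
            # person_cues: (B, T, person_dim) gaze/body cues; context: (B, T, context_dim)
            out, _ = self.encoder(torch.cat([person_cues, context], dim=-1))
            return out[:, -1]                      # last-step latent state

    class MToMnetSketch(nn.Module):
        # One MindNet per person; latent representations are fused by concatenation.
        def __init__(self, n_beliefs=4, hidden=128):
            super().__init__()
            self.mind_a, self.mind_b = MindNet(hidden=hidden), MindNet(hidden=hidden)
            self.classifier = nn.Linear(2 * hidden, n_beliefs)

        def forward(self, cues_a, cues_b, context):
            latent = torch.cat([self.mind_a(cues_a, context),
                                self.mind_b(cues_b, context)], dim=-1)
            return self.classifier(latent)         # belief-class scores

    model = MToMnetSketch()
    scores = model(torch.randn(2, 10, 32), torch.randn(2, 10, 32), torch.randn(2, 10, 64))
    print(scores.shape)    # torch.Size([2, 4])

The re-ranking variant mentioned in the abstract would instead combine per-MindNet classification scores rather than latent vectors.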


Nonverbal Communication through Expressive Objects

Communications of the ACM

Augmentative and alternative communication (AAC) devices enable speech-based communication, but generating speech is not the only resource needed to have a successful conversation. Being able to signal one wishes to take a turn by raising a hand or providing some other cue is critical in securing a turn to speak. Experienced conversation partners know how to recognize the nonverbal communication an augmented communicator (AC) displays, but these same nonverbal gestures can be hard to interpret for people who meet an AC for the first time. Prior work has identified motion through robots and expressive objects as a modality that can support communication. In this work, we work closely with an AAC user to understand how motion through a physical expressive object can support their communication. We present our process and resulting lessons on the designed object and the co-design process.

Augmented communicators (ACs) with motor disabilities that affect speech production may use augmentative and alternative communication (AAC) devices to speak. AAC devices include picture or letter boards that people can point to or speech-generating devices people can use to compose messages [2]. Commercial speech-generating AAC systems are currently only customizable at the word selection and speech production levels, and they do not yet support augmentations that can increase non-verbal communication. Nonverbal communication is key in helping regulate turn-taking, convey personality, and execute actions that increase social agency [12], all of which are current challenges for ACs [15, 22]. For instance, ACs are compelled to respond within the synchronous timing constraints of in-person interactions even though they use an asynchronous text-based medium [10]. ACs have to compose a message on their device using text and then share it via text-to-speech, while a non-augmented conversation partner responds synchronously using speech without needing to compose a message. Prior work identified motion-based AAC as a viable and under-explored modality for increasing ACs' agency in conversation [21]. We build on this prior work to dig deeper into a particular case study on motion-based AAC by co-designing a physical expressive object, or sidekick, to support ACs during conversations.


Nonverbal Cues in Human-Robot Interaction: A Communication Studies Perspective

Urakami, Jacqueline, Seaborn, Katie

arXiv.org Artificial Intelligence

Communication between people is characterized by a broad range of nonverbal cues. Transferring these cues into the design of robots and other artificial agents that interact with people may foster more natural, inviting, and accessible experiences. In this position paper, we offer a series of definitive nonverbal codes for human-robot interaction (HRI) that address the five human sensory systems (visual, auditory, haptic, olfactory, gustatory) drawn from the field of communication studies. We discuss how these codes can be translated into design patterns for HRI using a curated sample of the communication studies and HRI literatures. As nonverbal codes are an essential mode in human communication, we argue that integrating robotic nonverbal codes in HRI will afford robots a feeling of "aliveness" or "social agency" that would otherwise be missing. We end with suggestions for research directions to stimulate work on nonverbal communication within the field of HRI and improve communication between humans and robots.


Admoni

AAAI Conferences

In typical human interactions, nonverbal behaviors such as eye gazes and gestures serve to augment and reinforce spoken communication. To use similar nonverbal behaviors in human-robot interactions, researchers can apply artificial intelligence techniques such as machine learning, cognitive modeling, and computer vision. But knowledge of nonverbal behavior can also benefit artificial intelligence: because nonverbal communication can reveal human mental states, these behaviors provide additional input to artificial intelligence problems such as learning from demonstration, natural language processing, and motion planning. This article describes how nonverbal communication in HRI can benefit from AI techniques as well as how AI problems can use nonverbal communication in their solutions.


Learning Triadic Belief Dynamics in Nonverbal Communication from Videos

Fan, Lifeng, Qiu, Shuwen, Zheng, Zilong, Gao, Tao, Zhu, Song-Chun, Zhu, Yixin

arXiv.org Artificial Intelligence

Humans possess a unique social cognition capability; nonverbal communication can convey rich social information among agents. In contrast, such crucial social characteristics are mostly missing in the existing scene understanding literature. In this paper, we incorporate different nonverbal communication cues (e.g., gaze, human poses, and gestures) to represent, model, learn, and infer agents' mental states from pure visual inputs. Crucially, such a mental representation takes the agent's belief into account so that it represents what the true world state is and infers the beliefs in each agent's mental state, which may differ from the true world states. By aggregating different beliefs and true world states, our model essentially forms "five minds" during the interactions between two agents. This "five minds" model differs from prior works that infer beliefs in an infinite recursion; instead, agents' beliefs converge into a "common mind". Based on this representation, we further devise a hierarchical energy-based model that jointly tracks and predicts all five minds. From this new perspective, a social event is interpreted by a series of nonverbal communication and belief dynamics, which transcends the classic keyframe video summary. In the experiments, we demonstrate that using such a social account provides better video summaries of videos with rich social interactions than state-of-the-art keyframe-based video summary methods.
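
The abstract does not enumerate the five minds, but one plausible reading for two agents is: each agent's own belief, each agent's estimate of the other's belief, and the converged common mind. The sketch below encodes that reading as a simple data structure; the decomposition and the update rules are assumptions, not the paper's model.

    from dataclasses import dataclass, field

    @dataclass
    class FiveMinds:
        m1: dict = field(default_factory=dict)    # agent 1's belief about the world
        m2: dict = field(default_factory=dict)    # agent 2's belief about the world
        m12: dict = field(default_factory=dict)   # agent 1's estimate of agent 2's belief
        m21: dict = field(default_factory=dict)   # agent 2's estimate of agent 1's belief
        mc: dict = field(default_factory=dict)    # converged "common mind"

        def observe(self, agent, fact):
            # A private observation updates only the observing agent's own mind.
            (self.m1 if agent == 1 else self.m2).update(fact)

        def communicate(self, fact):
            # A nonverbal signal (e.g. pointing) makes a fact common ground for all minds.
            for mind in (self.m1, self.m2, self.m12, self.m21, self.mc):
                mind.update(fact)

    minds = FiveMinds()
    minds.observe(1, {"cup": "behind_box"})       # only agent 1 knows this
    minds.communicate({"ball": "under_table"})    # shared via a pointing gesture
    print(minds.m2, minds.mc)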